Self-Learning for Few-Shot Remote Sensing Image Captioning
Authors
Abstract
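Large-scale caption-labeled remote sensing image samples are expensive to acquire, and the training samples available in practical application scenarios are generally limited. Therefore, remote sensing image caption generation tasks will inevitably fall into the dilemma of few-shot learning, resulting in poor quality of the generated text descriptions. In this study, we propose a self-learning method named SFRC for few-shot remote sensing image captioning. Without relying on additional labeled samples or external knowledge, SFRC improves performance in few-shot scenarios by ameliorating the way and the efficiency of learning on limited data. We first train an encoder for semantic feature extraction using a supplemental modified BYOL self-supervised learning method on a small number of unlabeled remote sensing samples, where the unlabeled samples are derived from the caption-labeled samples. When training the model for caption generation on a small number of caption-labeled samples, the self-ensemble yields a parameter-averaging teacher model based on the integration of intermediate morphologies of the model over a certain training time horizon. The self-distillation uses the self-ensemble-obtained teacher model to generate pseudo labels that guide the student model in the next generation to achieve better performance. Additionally, when optimizing the model by parameter back-propagation, we design a baseline incorporating self-critical self-ensemble to reduce the variance during gradient computation and weaken the effect of overfitting. In a range of experiments using only limited caption-labeled samples, the evaluation metric scores of SFRC exceed those of recent methods. We conduct percentage sampling few-shot experiments to test the performance of SFRC in few-shot remote sensing image captioning with fewer samples. We also conduct ablation experiments on the key designs in SFRC. The results prove that these self-learning designs for sparse remote sensing sample scenarios are indeed fruitful, and each design contributes to the performance of the SFRC method.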
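The parameter-averaging teacher mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the averaging scheme, so the uniform average over a window of saved checkpoints below is an assumption, and the names `Params` and `average_checkpoints` are hypothetical and not taken from the paper.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def average_checkpoints(checkpoints: List[Params]) -> Params:
    """Build a teacher by uniformly averaging parameters across checkpoints.

    Assumption: every checkpoint has the same parameter names and shapes,
    and each checkpoint in the window receives equal weight.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    teacher: Params = {}
    for name, first in checkpoints[0].items():
        # Average position i of this parameter over all checkpoints.
        teacher[name] = [
            sum(ckpt[name][i] for ckpt in checkpoints) / n
            for i in range(len(first))
        ]
    return teacher


# Usage: two intermediate states of a toy model with a single parameter "w".
window = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
teacher = average_checkpoints(window)
```

In a self-distillation loop, a teacher obtained this way would generate pseudo labels for the next-generation student; that step is omitted here because the abstract gives no further detail.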
Similar resources
Deep Self-taught Learning for Remote Sensing Image Classification
This paper addresses the land cover classification task for remote sensing images by deep self-taught learning. Our self-taught learning approach learns suitable feature representations of the input data using sparse representation and undercomplete dictionary learning. We propose a deep learning framework which extracts representations in multiple layers and use the output of the deepest layer ...
Contrastive Learning for Image Captioning
Image captioning, a popular topic in computer vision, has achieved substantial progress in recent years. However, the distinctiveness of natural descriptions is often overlooked in previous work. It is closely related to the quality of captions, as distinctive captions are more likely to describe images with their unique aspects. In this work, we propose a new learning method, Contrastive Learn...
Few-shot Learning
Though deep neural networks have shown great success in the large data domain, they generally perform poorly on few-shot learning tasks, where a classifier has to quickly generalize after seeing very few examples from each class. The general belief is that gradient-based optimization in high capacity classifiers requires many iterative steps over many examples to perform well. Here, we propose ...
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning
The existing image captioning approaches typically train a one-stage sentence decoder, which struggles to generate rich fine-grained descriptions. On the other hand, a multi-stage image caption model is hard to train due to the vanishing gradient problem. In this paper, we propose a coarse-to-fine multistage prediction framework for image captioning, composed of multiple decoders each of which...
Prototypical Networks for Few-shot Learning
A recent approach to few-shot classification called matching networks has demonstrated the benefits of coupling metric learning with a training procedure that mimics test. This approach relies on an attention scheme that forms a distribution over all points in the support set, scaling poorly with its size. We propose a more streamlined approach, prototypical networks, that learns a metric space...
Journal
Journal title: Remote Sensing
Year: 2022
ISSN: 2315-4632, 2315-4675
DOI: https://doi.org/10.3390/rs14184606